Here, I discuss the design and analysis of systematic experiments for investigating the effect of various parameters and model design choices on the target outcome. The aim is to provide a comprehensive and reproducible analysis of the simulation results. The basis of the design and analysis of the systematic experiments described here is the instructions provided by Lorscheid et al. (2012) and Law (2015).

1 Design description

1.1 The objectives of the experiments

We aim to identify the factors that affect the focal model response, namely the ratio of executed actions corresponding to the long-term goal (LGR) after parental monitoring has lifted. More specifically, we wish to answer the following questions:

  1. How does changing the parental rules from strict to lax affect adolescents’ LGR?
  2. How does the link between the LGR and the rules differ for different levels of rule enforcement by the parents?
  3. How do adolescents’ characteristics interact with parental rules to impact their LGR?

1.2 Classification of variables

Table 1 lists the model’s independent, dependent, and control variables. The response variable of these experiments is the ratio of the executed long-term goal actions when there is no parental monitoring. This ratio has a lower bound of approximately the value of the Long-term Goal Preference Probability (LGPP) variable and an upper bound of 1 (the uncertainty is due to the randomness implemented in the goal selection process).



We will treat the control variables as separate factors to study their effects on the response variable (for a detailed description of these variables, refer to the model’s ODD).

1.3 Response variable and factors

As stated above, the response variable of these experiments is the ratio of the executed long-term goal actions when there is no parental monitoring, bounded below by approximately the LGPP and above by 1. Therefore, to compare the outcomes of simulations with various LGPP levels, we use the difference between the resulting LGR and the corresponding LGPP as the final response variable, which we name LGRD. The LGR represents the agent’s self-control score, and we therefore interpret the LGRD as the improvement in this score. The following equation computes the LGRD over the Response Steps:

\[ \small LGRD = \frac{\textrm{Number of executed LG actions}}{\textrm{Total number of executed actions}} - LGPP \]

Here, the Total number of executed actions equals Response_Steps: after the main body of the simulation ends, we continue the process for a number of steps determined by the constant Response_Steps, during which we lift the parental rule and keep the state space of the adolescent agent constant. The aim is to provide a frozen state in which to compute the LGR. We count the number of LG actions the agent performs within these steps and use it to calculate the ratio of executed LG actions, while the LGPP gives us the reference point for selecting LG actions.
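As a concrete illustration, the LGRD computation above fits in a few lines. The original analysis is implemented in R scripts; the following Python sketch is only illustrative, and the counts in the example call are hypothetical.

```python
def lgrd(executed_lg_actions: int, response_steps: int, lgpp: float) -> float:
    """LGRD: the ratio of executed long-term-goal (LG) actions within the
    frozen Response_Steps window, minus the agent's Long-term Goal
    Preference Probability (LGPP)."""
    lgr = executed_lg_actions / response_steps  # ratio of executed LG actions
    return lgr - lgpp

# Hypothetical example: 187 LG actions in a 312-step response window, LGPP = 0.1
print(lgrd(187, 312, 0.1))
```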

Table 2 contains the independent and control variables and their corresponding ranges. The PP parameter determines the ratio of steps in which the parent agent is present. We have chosen a lower limit of 5% and an upper limit of 80% to represent the span of parents’ active rule enforcement. Moreover, we set the LGPP range below 0.5 to indicate a lower intrinsic preference for the long-term goal. The SGLW parameter captures the relative weight of the two criteria of action selection, the Goal Preference Probability (GPP) and the State-Goal Link (SGL); a higher SGLW indicates a higher impact of the State-Goal Link on the final score of each goal. The CM is a categorical variable with two categories, which we refer to as the FC and PC modes. Finally, the SD indicates the number of steps in the primary cycle of the simulation, in which the parent agent sets and enforces the rules and the adolescent agent’s state space takes shape.



1.4 The factorial design

1.4.1 Defining the initial factor levels

We have conducted separate experiments for different levels of the factors R, LGPP, and CM. For each of these, we use a \(\small 2^3\) factorial design on PP, SGLW, and SD to find the combination that results in higher values of LGRD. This configuration gives us eight design points per experiment. Note that we are not performing an optimization; instead, we aim to obtain values that work reasonably better than others, which we name the Possible Best Combinations (PBCs) of each experiment.

The initial factor levels, listed in Table 3, are similar for the different configurations of R, LGPP, and CM.



1.4.2 Estimation of the experimental error

As the simulation incorporates random processes, the results of repeated runs contain inevitable variability. Therefore, before starting the systematic experiments, we must estimate this variability, or experimental error. To approximate the experimental error, we repeat the simulation with a fixed setting for an increasing number of runs and then compare the mean and variance of the outcomes. Table 4 lists the preliminary design points for estimating the error variance (EEV). These design points were selected to represent combinations of the low, mean, and high levels of the factors. (Note that the low and high levels provided for the EEV experiments differ from those introduced in Table 2: during the main experiments, we changed the factors’ range limits but, because of time constraints, did not repeat the EEV experiments. However, the overall results would likely not change much.)



Using each preliminary design point, we repeat the simulation for a series of run numbers and for three parental SG limits representing lax, medium, and strict rules. Next, we calculate the Coefficient of Variation (CV) for each design point’s outcome. The CV, computed as the standard deviation divided by the mean of the resulting LGRs, provides a dimensionless measure of variability, which allows us to compare the effect of the number of runs on the variability of the simulation outcome. We increase the number of runs until the CV stops changing and the variability of the outcome stabilizes. The corresponding run number determines how many times we should repeat each design point in the main experiment.
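The run-number search can be sketched as follows. This Python fragment is illustrative only (the actual EEV experiments use the R code EV_Estimation.R); the noisy_lgr stand-in for a single simulation run is hypothetical.

```python
import random
import statistics

def coefficient_of_variation(values):
    """CV: standard deviation divided by the mean (dimensionless)."""
    return statistics.stdev(values) / statistics.mean(values)

def cv_by_run_number(simulate, run_numbers, seed=1):
    """Repeat the stochastic simulation N times for each candidate run
    number N and record the CV of the outcomes; the run number at which
    the CV stabilizes is used in the main experiments."""
    rng = random.Random(seed)
    return {n: coefficient_of_variation([simulate(rng) for _ in range(n)])
            for n in run_numbers}

# Hypothetical stand-in for one simulation run at a fixed design point:
def noisy_lgr(rng):
    return 0.5 + rng.gauss(0, 0.05)

print(cv_by_run_number(noisy_lgr, [100, 200, 400, 800]))
```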

1.4.3 The design matrix and the effect analysis

To simulate the experiment for all factor combinations, we need to repeat the process for the \(\small 2^3 = 8\) design points (Table 5).
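The standard-order design matrix can also be generated programmatically. The sketch below is an illustrative Python fragment (the experiments themselves use Systematic_Experiments_2^3.R), and the low/high values shown are placeholders, not the actual Table 3 levels.

```python
def design_signs(k):
    """Level signs (-1 / +1) for a full-factorial 2^k design in standard
    order: factor 1 alternates fastest, so design point 1 is (-,-,-) and
    the last design point is (+,+,+)."""
    return [tuple(+1 if (i >> j) & 1 else -1 for j in range(k))
            for i in range(2 ** k)]

# Map signs to factor levels (placeholder values, not the real Table 3 ones):
levels = {'PP': (0.2, 0.6), 'SGLW': (2.0, 8.0), 'SD': (960, 1440)}
matrix = [{name: lo if sign < 0 else hi
           for (name, (lo, hi)), sign in zip(levels.items(), signs)}
          for signs in design_signs(3)]

for dp, row in enumerate(matrix, start=1):
    print(dp, row)
```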



The main effect of factor \(\small i\), denoted \(\small e_i\), is computed as the average change in the response when this factor changes from its \(\small -\) to its \(\small +\) level, while all other factors are held constant. For example, we could compute the main effect of factor 1 (PP) using the following equation:

\[ \small e_1 = \frac{(R_2 - R_1) + (R_4 - R_3) + (R_6 - R_5) + (R_{8} - R_{7})}{4} \]

We can simplify this computation by using the level representation of the corresponding factor as the sign of each design point’s response and summing accordingly. For example, in design point 4, factor 3 (SD) is represented by its \(\small -\) level, and therefore R4 appears with a negative sign in the effect’s numerator. Using the same procedure for the other responses results in the following equation for the main effect of this factor:

\[ \small e_3 = \frac{-R_1 - R_2 - R_3 - R_4 + R_5 + R_6 + R_7 + R_8}{4} \]

We are also interested in the interaction of factors and its effect on the response variable. The interaction effect of factors \(\small i\) and \(\small j\), denoted \(\small e_{ij}\), is computed as half the difference between the average effect of factor \(\small i\) when factor \(\small j\) is at its \(\small +\) level and its average effect when factor \(\small j\) is at its \(\small -\) level, while all other factors are held constant. For example, the interaction effect of factors 1 and 2 is as follows:

\[ \small e_{12} = \frac{1}{2} \left[\frac{(R_4 - R_3) + (R_8 - R_7)}{2} - \frac{(R_2 - R_1) + (R_6 - R_5)}{2}\right] \]

Again, we can simplify this process by using the level representations of the corresponding factors as signs. Here, the sign of each response is the product of the signs of the involved factors at that design point. For example, at design point 3, factor 1 is at its \(\small -\) level and factor 2 at its \(\small +\) level; multiplying \(\small -\) by \(\small +\) gives a negative sign, and therefore R3 appears with a negative sign in the equation.
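The sign trick translates directly into code: multiply each response by the product of the involved factors’ signs and divide by half the number of design points. This Python sketch is illustrative (the actual analysis is done in Effect_Analysis_2^3.R), and the response values are made up.

```python
def design_signs(k=3):
    """Signs of the k factors at each design point, standard order
    (factor 1 alternating fastest)."""
    return [tuple(+1 if (i >> j) & 1 else -1 for j in range(k))
            for i in range(2 ** k)]

def effect(R, which):
    """Main effect (which = (i,)) or interaction (which = (i, j)) from the
    eight responses R[0]..R[7]: sum each response times the product of the
    involved factors' signs, then divide by 4 (half the design points)."""
    total = 0.0
    for signs, r in zip(design_signs(), R):
        sign = 1
        for f in which:
            sign *= signs[f]
        total += sign * r
    return total / 4

R = [0.30, 0.32, 0.35, 0.38, 0.31, 0.33, 0.36, 0.40]  # made-up responses
print('e1  =', effect(R, (0,)))      # main effect of factor 1 (PP)
print('e3  =', effect(R, (2,)))      # main effect of factor 3 (SD)
print('e12 =', effect(R, (0, 1)))    # interaction of factors 1 and 2
```

For these made-up responses, the sign method reproduces the averaged-differences formula: e1 = ((R2 − R1) + (R4 − R3) + (R6 − R5) + (R8 − R7)) / 4 = 0.0275.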

The above calculations give us the main effects of the factors and their interactions in each of the simulation runs. As we have N runs for each design point, we end up with N values per effect, representing the distribution of our response. We use this distribution to estimate a confidence interval likely to contain the target parameter.

As we estimate multiple effects simultaneously, we need to take the Bonferroni inequality into account:

\[ \small P(\mu_s \in I_s \ \ \textrm{for all} \ s = 1, 2, 3, ...,k) \ge 1 - \sum_{s=1}^{k} \alpha_s \]

According to this inequality, the probability that the parameter \(\small \mu_s\) of every effect falls within its corresponding confidence interval simultaneously is at least \(\small 1 - \sum_{s=1}^{k} \alpha_s\), where \(\small 1 - \alpha_s\) is the confidence level of each individual interval. Therefore, as we have six effects in total (three main effects and three interaction effects), to achieve an overall confidence level of at least 90%, we need a confidence level of 98.33% for each of the six effects.
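The Bonferroni adjustment can be sketched as follows. This illustrative Python fragment uses a normal quantile in place of the t quantile, a reasonable approximation for the 100 runs per design point used here; the sample values are hypothetical.

```python
import statistics

def bonferroni_cis(effect_samples, overall_alpha=0.10):
    """Simultaneous confidence intervals for several effects: each of the
    k effects gets alpha / k, so the joint coverage is at least 1 - alpha
    by the Bonferroni inequality. With k = 6 effects and alpha = 0.10,
    each interval uses a 98.33% confidence level."""
    k = len(effect_samples)
    z = statistics.NormalDist().inv_cdf(1 - (overall_alpha / k) / 2)
    cis = {}
    for name, sample in effect_samples.items():
        half = z * statistics.stdev(sample) / len(sample) ** 0.5
        mean = statistics.mean(sample)
        cis[name] = (mean - half, mean + half)
    return cis

# Hypothetical per-run effect estimates (k = 2 effects for brevity);
# an effect is significant when its interval excludes zero.
samples = {'e1': [0.01, 0.03, 0.02, 0.04], 'e12': [-0.01, 0.02, 0.00, 0.01]}
print(bonferroni_cis(samples))
```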

1.5 The experiment procedure

The overall procedure of all experiments is as follows:

  1. Set the factor levels and create the design matrix (Systematic_Experiments_2^3.R)
  2. Run each design point 100 times (Systematic_Experiments_2^3.R)
  3. Compute the response summary and the effect matrix (Effect_Analysis_2^3.R)
  4. Choose the next factor levels based on the results:
    I. Follow the significant (S) effects and ignore the non-significant (NS) ones
    II. If all effects become NS or some reach their upper/lower limits after the 5th iteration: terminate the experiment
    III. Else, do this:
      a. Come up with new combination(s) by considering the interactions (Interaction_Analysis_2^3.R)
      b. Compare the new iteration with the previous one (Compare_Iterations.R):
        • If it is not significantly better: terminate the experiment
        • Else: go back to step 1

Initially, we set the factor levels to the values in Table 3 and execute each design point 100 times. We then store the LGRD of each run in an Excel file and compute the main effects and interactions.

To choose the new factor levels, we leave the factors with non-significant main effects as they are and change the factor levels of those with significant effects accordingly. For example, if the SGLW factor shows a significant positive effect on the LGRD, we set the next levels of this factor around the previous high level. In the case of interactions, we consider the combining effects of the involved factors.

The experiment ends when all effects become NS or when the significant effects reach their upper/lower limits. However, in cases where this happens before the 5th iteration, we continue the search by moving toward the suggested trends. Note that, as mentioned before, we are not performing an optimization; we are only trying to arrive at a set of parameters that could probably result in higher LGRDs. Therefore, although this maximum number of iterations is arbitrary, we try to systematize the search for the PBCs by following similar rules for all experiments.

In cases where all effects have become NS, we choose the low-level values of each factor for the PBC. However, when a factor reaches its upper boundary with a significant positive effect, its high-level value enters the PBC. Likewise, when a factor hits its lower limit with a significant negative effect, we choose its low-level value for the PBC.

The comparison in step 4.III.b (between two iterations) includes these steps (Compare_Iterations.R):

  1. Check the significance of effects in the first iteration:
    • All NS: Pool all runs from all DPs into one sample
    • Some S: Pool all runs from significantly better DPs into one sample
  2. Repeat the first step for the second iteration.
  3. Compare the obtained samples of each iteration with Welch’s confidence interval
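The Welch comparison in the final step can be sketched as below. This Python fragment is illustrative (the actual comparison is implemented in Compare_Iterations.R); it uses a normal quantile in place of the Welch–Satterthwaite t quantile, acceptable for pooled samples of several hundred runs, and the simulated iteration data are made up.

```python
import random
import statistics

def welch_ci(sample_a, sample_b, alpha=0.10):
    """Confidence interval for mean(a) - mean(b) without assuming equal
    variances (Welch). If the interval lies entirely below zero, the
    reference sample a has a significantly lower mean than b."""
    na, nb = len(sample_a), len(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    se = (statistics.variance(sample_a) / na
          + statistics.variance(sample_b) / nb) ** 0.5
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z * se, diff + z * se

# Made-up pooled LGRD samples for two consecutive iterations:
rng = random.Random(0)
iter2 = [rng.gauss(0.40, 0.03) for _ in range(800)]
iter3 = [rng.gauss(0.41, 0.03) for _ in range(800)]
lo, hi = welch_ci(iter2, iter3)
print(lo, hi)  # an interval below zero: continue with the new factor levels
```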

1.5.1 Considering an example

To better illustrate the process, let us go over the experiment for Rule 0.125, with LGPP = 0.1 and CM = PC_P_NC_A.



Running the first iteration with the initial factor levels listed in Table 3 results in the effect matrix of Table 7, in which the significant effects are marked in bold. As shown, only the main effects of the factors SGLW and SD are significant. Therefore, we choose the factor levels for the second iteration to be around the previous high levels of these two factors (see the second row of Table 6).



In the second iteration, all effects became NS; we name such iterations FNS, or Fully Non-Significant. As the FNS iteration occurred before iteration 5, we continue the experiment (step 4.III of the procedure) and set the new levels by considering the interactions between factors (see Fig. 1, Fig. 2, and Fig. 3). Note that all effects, including the interactions, are NS in this iteration; however, we use them as hints, or trends, pointing at combinations with possibly better LGRD.


Figure 1: Interaction plot of factors PP and SD at the second iteration of rule 0.125 experiment (created using the code Fig_PPxSD.R)


Figure 2: Interaction plot of factors PP and SGLW at the second iteration of rule 0.125 experiment (created using the code Fig_PPxSGLW.R)


Figure 3: Interaction plot of factors SD and SGLW at the second iteration of rule 0.125 experiment (created using the code Fig_SDxSGLW.R)


As there is no reason to assume that the two samples have equal variances, we use Welch’s t-test to compare them. All the effects in the third iteration (third row of Table 6) are also NS. Therefore, to see whether this iteration is significantly better than the previous one, we aggregate the results of all design points in each iteration and compare their means.

Performing Welch’s t-test on iterations 2 and 3 of rule 0.125 results in a CI of [-0.006, -0.002], meaning the 2nd iteration, chosen as the reference sample, has a significantly lower mean than the 3rd. Therefore, we can continue the experiment with these new factor levels. As all effects are NS, we again consider the interactions and hints when choosing the next iteration’s factor levels.

Now, the fourth iteration (fourth row of Table 6) results in a significant interaction effect between factors PP and SD (Fig. 4). Based on the design matrix (Table 5), the 6th and 8th design points are those where both PP and SD are at their high levels. Therefore, we aggregate the LGRD values of the 100 runs of these design points and consider them the new sample. The old sample is the aggregate of all eight design points of iteration 3 (this iteration is an FNS, and therefore its design points do not differ significantly). Performing Welch’s t-test on iterations 3 and 4 of rule 0.125 results in a CI of [-0.01, -0.002], meaning the 3rd iteration, chosen as the reference sample, has a significantly lower mean than the 4th. Again, we can continue the experiment based on these factor levels. The search ends when the 6th iteration becomes an FNS, and the PBC is PP = 0.7, SGLW = 5.5, and SD = 1584.


Figure 4: Interaction plot of factors PP and SD at the fourth iteration of rule 0.125 experiment (created using the code Fig_PPxSD.R)


Note that in cases where the 5th iteration is also an FNS but has a significantly higher mean, the experiment terminates, and we choose the low levels of this 5th iteration as the final combination. On the other hand, if the mean does not differ significantly, we discard the new iteration and choose the PBC from the previous one.

2 Results

2.1 Estimation of the experimental error

To estimate the error variance, we run the EV_Estimation.R code for Rule = 0.875 (lax), Rule = 0.5 (medium), and Rule = 0.125 (strict) (Table 8, Table 9, and Table 10, respectively), with run numbers of 100, 200, 300, 400, 600, and 800. The data of these experiments are available in the SimulationData directory.





As evident from the results, the coefficient of variation remains relatively stable from 100 to 800 runs. Therefore, to keep the time and hardware burden of the simulation at a minimum, we choose to repeat each design point 100 times.

2.2 Systematic experiments

2.2.1 Differences in LG action distribution

As discussed in the introduction of the main article, according to WTT (Fleeson & Jayawickreme, 2015), although people’s actions during their day-to-day activities show unquestionable variability, we can observe stable patterns in these actions that explain between-person differences. In other words, an individual’s actions span a distribution whose mean and standard deviation differ from person to person. In this model, the agent’s choice at every step represents its state, and we interpret the mean of the distribution of these states as the agent’s trait. Here, we aim to show two differences in this mean, one within a single agent and the other between two agents (the data used to create this section’s figures, stored in the SimulationData/RM_SingleExp_WTT.xlsx file, result from running the code Single_Experiment_ForWTT.R).

In Fig. 5, we show the distributions of the first and the last third of the simulation using the combination of parameters R = 0.5, PP = 0.2, LGPP = 0.1, SGLW = 10.0, CM = FC_P_PC_A, and SD = 1680. To create this figure, we run the PBC of rule 0.5 one hundred times and store the LGRD of the first and last 312 steps. This number of steps is the constant Response_Steps, over which we calculate the LGRD in all experiments. The aim is to compare a single agent’s initial and final conditions across the diverse situations created by different runs. As the figure shows, despite the overlapping states, the agent’s trait, i.e., the mean of the distribution, has improved over time.


Figure 5: The distribution of the model response in the initial and last days of the simulation for a single agent with the following combination of parameters: R = 0.5, PP = 0.2, LGPP = 0.1, SGLW = 10.0, CM = FC_P_PC_A, and SD = 1680 (created using the code Fig_WWTT.R)


Moreover, to examine between-person differences, we plot the distributions of two combinations, which we interpret as representing two agents in different conditions. The first set of parameters represents an adolescent whose parents impose strict rules, and the other exemplifies a youth whose parents have medium expectations. Again, the results, depicted in Fig. 6, show a distinction in the distribution means, indicating a higher trait score for agent two.


Figure 6: The distribution of the model response for two agents, i.e., two combinations of parameters. Agent one: R = 0.125, PP = 0.8, LGPP = 0.1, SGLW = 6.0, CM = FC_P_PC_A, and SD = 1392. Agent two: R = 0.5, PP = 0.2, LGPP = 0.1, SGLW = 10.0, CM = FC_P_PC_A, and SD = 1680 (created using the code Fig_BWTT.R)


2.2.2 Examining the outcome of different rules

The main pattern we aim to investigate with our model is the link between choosing to execute LG actions after parental monitoring has lifted and the spectrum of parental limit-setting. Using the steps described in the methods section, we specify the PBCs for each rule while setting the LGPP to 0.1, 0.2, 0.3, and 0.4 and the CM to PC_P_NC_A. We choose nine ratios from the range [0, 1] to serve as rules, namely 0.001, 0.045, 0.125, 0.25, 0.33, 0.5, 0.67, 0.75, and 0.875. These ratios span the range of strict to lax rules by allowing the adolescent agent to execute 0, 1, 3, 6, 7, 12, 16, 18, and 21 SG actions within every 24 steps. The complete data of these experiments are available in the SimulationData/LGPP-CM directory.

Fig. 7 shows the LGRD vs. rules for different LGPP levels in the PC compliance mode. The resulting graph appears as an inverted U-shape for LGPP levels of 0.1 and 0.2. At higher LGPP levels, the LGRD does not vary much between different rules, and the graph appears flat. Moreover, the gain is higher for lower LGPP levels; in other words, those with already high LGPP values benefit less from parental rules than those with lower LG preference probabilities. However, all results have confidence intervals above zero, suggesting a significant positive impact of parental limit-setting on the adolescent’s likelihood of executing LG actions.


Figure 7: Self-control improvement (LGRD) vs. parental rules for different LGPP levels in the PC_P_NC_A compliance mode (created using the code Fig_LGPP-PC.R)


In cases of higher LGPP levels, the lax parental SG limits do not require much more than what the adolescent agent already performs, and when the rule becomes strict, the compliance rate drops and the agent does not execute extra LG actions. In the case of LGPP = 0.1, by contrast, the larger gap between the rule and the adolescent’s intrinsic motivation to select LG actions leads to higher rates of executed LG actions. As long as these extra LG actions add to the LG attractors, they contribute to a higher probability of choosing LG actions in the future. When the addition to the LG attractors drops, i.e., Perceived Autonomy (PA) decreases, the gain in selecting LG actions also declines.

The link between parental rules and LGRD for different LGPP levels in the FC mode shows a different pattern (Fig. 8). In this mode, when using each rule’s PBC, we observe a steady increase in the LGRD as the rules become stricter for higher LGPP levels. This increase is also evident for the three most restrictive rules in the LGPP = 0.1 case. This seemingly strange behavior is due to the high level of PP in these rules’ PBCs. As the FC mode enjoys full compliance in the parent agent’s presence, higher PP levels in this mode mean higher compliance rates. However, the PA also drops for strict rules and high PP probabilities. Still, as the rules become stricter, the adolescent agent has fewer and fewer chances to execute SG actions. Therefore, although the forced LG actions do not contribute to the LG attractors, the lack of SG actions leads to shallower SG attractors, which tips the selection process toward the LG ones. This process leads to a higher LGRD for high LGPP levels and strict rules: in these cases, although the PA is low, the agent is more likely to execute self-selected LG actions, which offer the highest addition to the LG attractors. Combining these deeper LG attractors with the severe limitations on SG actions leads to higher LGRDs at higher LGPP levels.


Figure 8: Self-control improvement (LGRD) vs. parental rules for different LGPP levels in the FC_P_PC_A compliance mode (created using the code Fig_LGPP-FC.R)


Lowering the PP probability in the case of higher LGPP levels (Fig. 9) results in an inverted U-shape pattern, as in this condition the strict limit on SG actions lifts, and the extra LG actions add little depth to the attractors because of the low PA.


Figure 9: Self-control improvement (LGRD) vs. parental rules for LGPP = 0.4 in the FC_P_PC_A compliance mode, each rule with its specific PBC and all with the combination PP = 0.05, SGLW = 10.0, and SD = 1920 (created using the code Fig_SCDC-FC-0.4.R)


As mentioned, the specific combinations assigned to each rule do not result in the highest possible LGRD values; arriving at such combinations would require an optimization-based search of the parameter space. However, using a systematic procedure, we have tried to find combinations that work reasonably well. Still, running all rules with the same combination of parameters in the PC mode does not change the overall pattern of the results (Fig. 10). On the other hand, as discussed above, the results of the FC mode are sensitive to the value of the PP probability when the LGPP is high.


Figure 10: Self-control improvement (LGRD) vs. parental rules for different LGPP levels in the PC_P_NC_A compliance mode with a similar combination of parameters of PP = 0.8, SGLW = 10.0, SD = 1920, for all rules (created using the code Fig_LGPP-PC.R)


2.2.3 Varying the presence probabilities

We also conducted experiments to investigate the link between the PP factor and the LGRD. To do this, we first fixed the LGPP at 0.1 and the CM to the PC mode and conducted \(\small 2^2\) factorial experiments to identify the possible best combinations of the SGLW and SD factor levels for five levels of PP and for lax, medium, and strict rules.


Figure 11: Self-control improvement (LGRD) vs. parental presence probability for lax, medium, and strict rules in the PC_P_NC_A compliance mode (created using the code Fig_PP.R)


As presented in Fig. 11, different levels of presence probability do not appear to significantly affect the LGRD in the lax and strict rule cases. In the case of the medium rule, on the other hand, too low and too high presence probabilities result in lower LGRDs than medium PP levels. In other words, these results appear to repeat the previously discussed inverted U-shaped association between the rules and the LGRD.

2.2.4 Comparing different compliance modes

As described in the ODD, we have implemented two compliance modes in this model, the FC_P_PC_A and the PC_P_NC_A, or FC and PC for short. Again, we set the LGPP to 0.1 and repeated the \(\small 2^3\) factorial experiments on the factors PP, SGLW, and SD to find the PBC for each rule in the FC and PC compliance modes. The complete data of these experiments are available in the SimulationData/LGPP-CM directory.

Fig. 12 shows the resulting LGRD values corresponding to each compliance mode. As evident, the LGRDs of the two CMs diverge as the rules get stricter, with the FC mode showing a higher LGRD. Considering the structure of our model, these results are not unexpected. In the FC mode, the adolescent’s compliance rises steadily as the probability of parental presence increases, which leads to a higher rate of executed LG actions within the main body of the simulation (where the parental rule still holds and the state space still changes). However, simply executing more LG actions does not necessarily lead to a higher LGRD after parental monitoring has lifted. The key here is the LG attractors, which, in forced-choice cases, are updated by the value of the adolescent’s Perceived Autonomy (PA). Yet, in the current implementation, the PA depends only on the parental rule and presence and does not differ between compliance modes. Therefore, in cases of similar SG limits and presence probabilities, the FC mode performs better than the PC mode, as it enjoys higher compliance while having the same values of volitional feedback.


Figure 12: Self-control improvement (LGRD) vs. parental rules for different compliance modes (created using the code Fig_FCPC.R)


These results appear to contradict the main claims and other findings. However, this model is a simplified version of what could occur in reality: fully complying with highly restrictive rules could bring about various psychological problems that undermine the possible gains of acting toward long-term goals. Moreover, we could consider more complex compliance patterns, where stricter rules and more enforcement lead to defiance, or investigate different levels of PA.

To see the effect of the Perceived_Autonomy, we show the results of running the FC experiment with a much lower PA in Fig. 13. As the figure shows, lowering the PA decreases the resulting LGRD. Note that although the lower PA results in shallower LG attractors as the rules become strict, a stricter limit on the SG actions leads to even shallower SG attractors and, subsequently, a higher chance of selecting LG actions. As a result, after an initial decline, the LGRD increases as the rules become restrictive.


Figure 13: Self-control improvement (LGRD) vs. parental rules for different levels of perceived autonomy in the FC_P_PC_A compliance mode (created using the code Fig_FCFC.R)


3 References

Fleeson, W., & Jayawickreme, E. (2015). Whole trait theory. Journal of Research in Personality, 56, 82–92. https://doi.org/10.1016/j.jrp.2014.10.009
Law, A. M. (2015). Simulation modeling and analysis (5th ed.). McGraw-Hill Education. www.averill-law.com
Lorscheid, I., Heine, B. O., & Meyer, M. (2012). Opening the ’black box’ of simulations: Increased transparency and effective communication through the systematic design of experiments. Computational and Mathematical Organization Theory, 18, 22–62. https://doi.org/10.1007/s10588-011-9097-3